Abstract

Many steganographic techniques have been proposed for hiding secret messages inside images, the simplest of them being LSB data hiding [6]. In this paper, we suggest a novel data hiding technique for Html web pages [12] and also propose some simple techniques to extend the embedding to source code written in any programming language, both case-insensitive (such as Html and Pascal) and case-sensitive (such as C, C++ and Java), as an extension to [12]. We exploit the case redundancy of case-insensitive languages, while in the other languages we hide data with minimal changes to the source code, so as to raise almost no suspicion. Html tags are case insensitive, so a lowercase letter and its uppercase counterpart inside an html tag are interpreted in the same manner by the browser; that is, a change of case in a web page is imperceptible to the browser. We first exploit this redundancy to embed secret data inside a web page with no changes visible to the user of the page, who therefore has no reason to suspect that data is hidden. The embedded data can be recovered by viewing the source of the html page. This technique can easily be extended to embed a secret message inside any piece of source code whose standard interpreter is case-insensitive. For case-sensitive programming languages, we make minimal changes to the source code (e.g., adding an extra character to a token identified by the lexical analyser) without violating the lexical and syntactic rules of the language, and try to make the change almost imperceptible.

1 Introduction 

Steganography is the practice of hiding secret data in a cover medium; it achieves imperceptibility by exploiting redundancies in the representation of the cover medium. For instance, LSB data hiding uses the property that the visual representation of the cover image is almost unaffected by a change in the LSB of any pixel, and this redundancy (the cover image being oblivious to changes in the LSB) is exploited to embed secret data in the LSBs. Also, some decomposition techniques were proposed to enhance the LSB data hiding technique by increasing the number of bit planes [6][7][8][9]. Some techniques for hiding data in executables have already been proposed (e.g., Shin et al. [4]). In this paper we introduce a very simple technique to hide secret message bits inside source code as well, as an extension of [12]. We describe our steganographic technique using html source as the cover text, but it can be extended to source code in any case-insensitive language, such as Basic, PASCAL or FORTRAN.

2 Hiding Data inside Html 

2.1 Exploiting the Case-Insensitivity 

Html tags are directives to the browser: they carry information about how to structure and display the data on a web page. They are not case sensitive, so tags in either case (or mixed case) are interpreted by the browser in the same manner (e.g., "<head>" and "<HEAD>" refer to the same thing). This case-insensitivity is a redundancy, and we exploit it. If the case of the letters inside the tags of the html cover text is manipulated according to the secret message bits, the browser ignores this tampering. It is therefore imperceptible to the user: there is no visible difference in the web page, and hence no reason for suspicion. Moreover, when the web page is displayed in the browser, only the text content is shown, not the tags (those can only be seen when the user does 'view source'). The secret message is thus hidden from the user, since it has no effect on the page displayed by the browser. In other words, the browser helps us hide the data by being indifferent to the case of the html tags, and we use the tags as the places for hiding data.

Only the portion of the cover text inside the html tags is used for hiding secret message bits (and may undergo a case conversion). We do not tamper with the html text data outside the tags, which the browser displays as the web page, so the user has no reason to suspect hidden data in the text. The html cover text is analogous to the cover image in steganographic techniques for images [1][2][3]. We only change the case of the characters within these Html tags (elements) in accordance with the secret message bits that we want to hide inside the html source.

As described in [12], if we think of the browser interpreter as a function $f_B : \Sigma^* \to \Sigma^*$, we see that it is non-injective, i.e., not one-to-one, since $f_B(x) = f_B(y)$ whenever $x \in \{\text{'A'}, \ldots, \text{'Z'}\}$, $y \in \{\text{'a'}, \ldots, \text{'z'}\}$ and $\mathrm{UpperCase}(y) = x$. The extraction of the hidden message is also very simple: one just does 'view source', observes the case pattern of the text within the tags, and can readily extract the secret message (and see the unseen), while others will not know anything. So both the embedding and the extraction of secret message bits are very simple.

Naturally, we cannot hide arbitrarily long data inside a given cover text. More precisely, the length (in bits) of the secret message to be hidden inside the html cover text is bounded above by the number of alphabetic characters inside the html tags. Here we do not consider attribute values for data embedding. If attribute values are used, more care is needed, since some of them are case sensitive: in <A HREF="link.html">, for example, the link file name may be case-sensitive on some systems, whereas an attribute value such as the one in <h2 align="center"> is safe. If fewer bits are embedded than the cover can hold, we record the length of the embedded data inside the Header tag (e.g., '<Header 25>' if the secret data to be embedded is 25 bits long). This value is not shown in the browser, and it can optionally be encrypted with some private key. To strengthen this very simple algorithm, one may also apply some simple encryption to the data to be embedded.
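The capacity bound can be checked with a few lines of Python. This is a minimal sketch, assuming that only tag names carry bits (attributes are skipped, as in the text); the regular expression and the function name `capacity_in_bits` are illustrative choices, and HTML comments or script bodies that contain `<` are not handled.

```python
import re

# Assumption: a tag name starts right after "<" or "</"; attributes are ignored.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")

def capacity_in_bits(html: str) -> int:
    """Count the letters inside tag names; each letter can carry one bit."""
    return sum(
        ch.isalpha()
        for match in TAG_NAME.finditer(html)
        for ch in match.group(1)
    )

cover = (
    "<html><head><title>Hello World</title></head>"
    "<body><h1>Hi</h1><p>Hello world</p></body></html>"
)
print(capacity_in_bits(cover))
```

For this small page the count covers the letters of `html`, `head`, `title`, `body`, `h1` and `p` in both their opening and closing tags; the digit in `h1` and the page text are not counted.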
[Figure 1: three panels. Input: a minimal "Hello World" cover Html page. Bit stream to be embedded in the web page: 1011010000110 1100110100101 100111000010 (total number of bits to embed = 38). Output: the stego Html page, whose header tag records the length (<hEad 38>). Detail: the input cover data <html> with secret data 1 0 1 1 gives the output stego data <HtML>.]



Fig. 1. Illustration of how the Html data hiding works 



2.2 The Algorithm for Hiding Data 

As described in [12], the algorithm for embedding the secret message inside the html cover text is very simple and straightforward. First, we need to separate out the characters of the cover text that are candidates for embedding; these are the case-insensitive text characters inside the Html tags. Figure 2 shows a very simplified automaton for this purpose. Also, let us define the following functions before describing the algorithm:



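• $l : \Sigma^* \to \Sigma^*$ as:

$$l(c) = \begin{cases} \mathrm{ToLower}(c) & c \in \{\text{'A'}, \ldots, \text{'Z'}\} \\ c & \text{otherwise} \end{cases}$$

where $\mathrm{ToLower}(c) = c + d$.

• $u : \Sigma^* \to \Sigma^*$ as:

$$u(c) = \begin{cases} \mathrm{ToUpper}(c) & c \in \{\text{'a'}, \ldots, \text{'z'}\} \\ c & \text{otherwise} \end{cases}$$

where $\mathrm{ToUpper}(c) = c - d$.

• Here $d = \text{'a'} - \text{'A'}$. The ascii value of 'A' is 65 and the ascii value of 'a' is 97, so the difference is $d = 32$.

It is easy to see that if the domain is restricted to $\{\text{'a'}, \ldots, \text{'z'}\} \cup \{\text{'A'}, \ldots, \text{'Z'}\}$, then $l$ maps every letter into $\{\text{'a'}, \ldots, \text{'z'}\}$ and $u$ maps every letter into $\{\text{'A'}, \ldots, \text{'Z'}\}$, so the images of $l$ and $u$ are complements of each other within this domain.

Now, proceeding as in [12], we want to embed the secret data bits $b_1 b_2 \ldots b_k$ inside the case-insensitive text inside the Html tags. Let $c_1 c_2 \ldots c_n$ denote the sequence of characters inside the html tags in the cover text (input html). A character $c_i$ is a candidate for hiding a secret message bit iff it is a letter. If we want to hide the $j$-th secret message bit $b_j$ inside the cover text character $c_i$, the corresponding stego text is defined by the following function $f_{stego}$. For all $c_i \in \{\text{'a'}, \ldots, \text{'z'}\} \cup \{\text{'A'}, \ldots, \text{'Z'}\}$, i.e., if $\mathrm{IsAlphabet}(c_i)$ is true,

$$f_{stego}(c_i) = \begin{cases} l(c_i) & b_j = 0 \\ u(c_i) & b_j = 1 \end{cases}$$

Hence, we have the following:

$$c_i \in \{\text{'a'}, \ldots, \text{'z'}\} \cup \{\text{'A'}, \ldots, \text{'Z'}\} \implies f_{stego}(c_i) = l(c_i) \cdot \overline{b_j} + u(c_i) \cdot b_j, \quad \forall i \qquad (1)$$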



The number of bits (k) of the secret message embedded into the html cover text must also be embedded inside the html (e.g., in the Header element). Figures 1 and 2, together with Algorithm 1, explain this data hiding algorithm.
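The embedding rule of Equation (1) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the tag-name regular expression and the function name `embed_bits` are assumptions, attributes are skipped, and the length k is assumed to be conveyed separately instead of being written into the header tag.

```python
import re

# Assumption: only tag names are carriers; attributes and page text are untouched.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")

def embed_bits(cover_html: str, bits: str) -> str:
    """Set the case of tag-name letters from `bits` (0 = lower, 1 = upper)."""
    chars = list(cover_html)
    j = 0  # index of the next secret bit
    for match in TAG_NAME.finditer(cover_html):
        for i in range(match.start(1), match.end(1)):
            if j == len(bits):
                return "".join(chars)  # all bits embedded; rest stays as is
            if chars[i].isalpha():
                chars[i] = chars[i].upper() if bits[j] == "1" else chars[i].lower()
                j += 1
    if j < len(bits):
        raise ValueError("cover text has too few tag letters for the message")
    return "".join(chars)

stego = embed_bits("<html><head></head></html>", "10110100")
print(stego)  # <HtML><hEad></head></html>
```

The first four bits 1011 turn `html` into `HtML`, matching the detail shown in Figure 1; tags after the last embedded bit are left unchanged.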

2.3 The Algorithm for Hidden Data Extraction

Again, proceeding as in [12], the algorithm for extraction of the secret message bits is even simpler. As in the embedding process, we must first separate out the candidate text (exactly the text within the Html tags) that was chosen in the earlier step for embedding secret message bits. Also, we must extract the number of bits (k) embedded into this page (e.g., from the header element). In order to obtain the stego text, one has to use 'view source'.

Now, we have $d_i = f_{stego}(c_i), \forall i \in \{1, 2, \ldots, n\}$. Only if $d_i \in \{\text{'a'}, \ldots, \text{'z'}\} \cup \{\text{'A'}, \ldots, \text{'Z'}\}$, i.e., $d_i$ is a letter, is it a candidate for decoding, and to extract the bit $b_j$ from $d_i$ we use the following logic:

$$b_j = \begin{cases} 0 & d_i \in \{\text{'a'}, \ldots, \text{'z'}\} \\ 1 & d_i \in \{\text{'A'}, \ldots, \text{'Z'}\} \end{cases}$$

Repeat the above for successive candidate characters until all $k$ hidden bits have been extracted.

[Figure 2: block diagram with four panels: "Browser function f_B for texts inside Html Tag"; "DFA to extract the text inside Html Tags" (with a "Skip Attributes" state); "Embedding Algorithm" (input: the characters inside the html tags in the cover text and the sequence of secret message bits; non-letters are skipped as not being candidates for embedding); and "Extraction Algorithm" (input: the characters inside the html tags in the stego text; a lowercase letter yields 0 and an uppercase letter yields 1).]

Fig. 2. Basic block-diagram for the Html data-hiding technique
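The decoding rule can be sketched in Python in the same spirit. The regular expression and the name `extract_bits` are illustrative assumptions, and the length k is passed in directly here rather than parsed from the header element.

```python
import re

# Assumption: only tag names were used as carriers, so attributes are skipped.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")

def extract_bits(stego_html: str, k: int) -> str:
    """Read k bits from the case of tag-name letters (lower = 0, upper = 1)."""
    bits = []
    for match in TAG_NAME.finditer(stego_html):
        for ch in match.group(1):
            if len(bits) == k:
                return "".join(bits)
            if ch.isalpha():  # digits such as the 1 in h1 carry no bit
                bits.append("1" if ch.isupper() else "0")
    if len(bits) < k:
        raise ValueError("stego text holds fewer than k candidate letters")
    return "".join(bits)

print(extract_bits("<HtML><hEad></head></html>", 8))  # 10110100
```

Passing a k smaller than the number of tag letters simply stops early, which mirrors the `break` on `j == k` in the extraction procedure.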

2.4 Experimental Results 

As in [12], we obtained the following results.

Figures 3, 4 and 5 (as in [12]) show an example of how our method works, while Figure 6 compares the histograms of the cover html and the stego html in terms of (ascii) character frequencies. Classical image hiding techniques such as LSB data hiding always introduce some (visible) distortion [10] in the stego image (which can be reduced using the techniques of [6][7][8][9]), but our data hiding technique in html is novel in the sense that it introduces no visible distortion at all in the stego page as rendered by the browser.

3 Hiding Data inside Other Source Codes

3.1 Case-insensitive Programming Language Sources

To hide data in source code written in languages for which letter case is ignored (e.g., PASCAL), we can follow an approach similar to the one above, but we must ensure the following:

• While embedding secret messages, we must not change anything in the source code that would alter the output of the program when it is run.

• We should not embed inside variable values, for instance string literals.
[Figure 3: source listing of the cover Html page, titled "Test Html (Steganography - Seeing the Unseen)": a table containing a list of image processing techniques, two copies of the Lena image, and explanatory text about embedding a secret message in the case of the html tags.]

Cover Html Source



Fig. 3. Cover Html source before embedding the secret message 
[Figure 4: source listing of the stego Html page, i.e., the page of Figure 3 with the case of the letters inside the tags changed according to the embedded bits.]

Stego Html Source
Secret Message embedded: "Copyright @ Sandipan Dey"

Embedded Bit Stream (length 200):
0100001101101111011100000111100101110010011010010110011101101000011101000010000001000000001000000101
0011011000010110111001100100011010010111000001100001011011100010000001000100011001010111100100000000

Fig. 4. Stego Html source after embedding the secret message 
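The 200-bit stream shown in Figure 4 is consistent with encoding each character of the 24-character message as 8-bit ASCII and appending one zero byte. That reading is an observation from the figure, not something the text states, and the helper name `message_to_bits` below is hypothetical.

```python
def message_to_bits(message: str) -> str:
    """Encode a message as 8-bit ASCII, most significant bit first, plus a zero byte."""
    data = message.encode("ascii") + b"\x00"  # assumed terminator
    return "".join(format(byte, "08b") for byte in data)

bits = message_to_bits("Copyright @ Sandipan Dey")
print(len(bits))   # 200
print(bits[:16])   # 0100001101101111  ('C' then 'o')
```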
[Figure 5: screenshot of the page "Test Html (Steganography - Seeing the Unseen)" in Windows Internet Explorer; the cover and stego pages look identical in the browser.]

Fig. 5. Cover vs. Stego Html

[Figure 6: histograms of character frequencies for the cover and stego html; horizontal axis: ASCII values, with 'A', 'Z', 'a' and 'z' marked.]

Fig. 6. Cover vs Stego Html Histogram 
DFA for an IDENTIFIER
Regular Expression: [a-zA-Z_][a-zA-Z0-9_]+



Fig. 7. DFA for accepting IDENTIFIER 



Algorithm 1 Algorithm to Hide Data inside Html
1: Search for all the html tags present in the html cover text and extract all the characters $c_1 c_2 \ldots c_n$ from inside those tags using the DFA described in Figure 2.
2: Embed the secret message length k inside the html header in the stego text.
3: j ← 0.
4: for $c_i \in$ HTMLTAGS, i = 1 ... n do
5: if $c_i \in \{\text{'a'}, \ldots, \text{'z'}\} \cup \{\text{'A'}, \ldots, \text{'Z'}\}$ then
6: $f_{stego}(c_i) = l(c_i) \cdot \overline{b_j} + u(c_i) \cdot b_j$.
7: j ← j + 1.
8: else
9: $f_{stego}(c_i) = c_i$.
10: end if
11: if j == k then
12: break.
13: end if
14: end for



• We should store the total number of bits embedded somewhere inside the source code, with or without encryption.

There are a few very simple ways of embedding, and depending on where the secret data bits are embedded there are a few variants:

• Embed aggressively at every possible place (except inside variable values, constants or string literals), in every keyword and identifier.

• Do not use all the characters of a candidate word for embedding; instead use only the first character (change its case depending on the next secret bit to be embedded, keeping all other characters unchanged).

• Embed only inside the keywords.

• Embed only inside the identifiers.
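The first-character variant can be sketched for a Pascal-like source as follows. This is a simplified illustration under stated assumptions: a regular expression stands in for a real scanner, single-quoted string literals are skipped, every keyword or identifier is treated as a carrier, and the number of embedded bits is assumed to be stored elsewhere. The name `embed_first_letter` is hypothetical.

```python
import re

# String literals are matched first so that words inside them are never touched.
TOKEN = re.compile(r"'(?:[^']|'')*'|[A-Za-z_][A-Za-z0-9_]*")

def embed_first_letter(source: str, bits: str) -> str:
    """Carry one bit per word in the case of its first letter (0 = lower, 1 = upper)."""
    bit_iter = iter(bits)

    def recase(match):
        token = match.group(0)
        if not token[0].isalpha():
            return token  # string literal or leading underscore: not a carrier
        bit = next(bit_iter, None)
        if bit is None:
            return token  # message exhausted: leave remaining words unchanged
        first = token[0].upper() if bit == "1" else token[0].lower()
        return first + token[1:]

    stego = TOKEN.sub(recase, source)
    if next(bit_iter, None) is not None:
        raise ValueError("source has too few candidate words for the message")
    return stego

print(embed_first_letter("program demo; begin writeln('begin here'); end.", "101"))
# Program demo; Begin writeln('begin here'); end.
```

Words inside comments are also used by this sketch; changing their case does not affect program output, but a stricter scanner could skip them.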

3.2 Case-sensitive Programming Language Sources

Languages like C are case-sensitive, so we cannot use the above-mentioned techniques directly. Of course we can hide data inside comments, but what if a source file has no comments at all? Creating arbitrary artificial comments and embedding data inside them may not be a good idea. Instead we can use the following simple general technique:

• Use a lexical analyzer (some simple scanner for the language) to find IDENTIFIER tokens (IDENTIFIER : [a-zA-Z_][a-zA-Z0-9_]+, as shown in Figure 7).
Algorithm 2 Hidden Data Extraction Algorithm
1: Search for all the html tags present in the html stego text and extract all the characters $d_1 d_2 \ldots d_n$ from inside those tags using the DFA described in Figure 2.
2: Extract the secret message length k from inside the html header in the stego text.
3: j ← 0.
4: for $d_i \in$ HTMLTAGS, i = 1 ... n do
5: if $d_i \in \{\text{'a'}, \ldots, \text{'z'}\}$ then
6: $b_j$ = 0.
7: j ← j + 1.
8: else if $d_i \in \{\text{'A'}, \ldots, \text{'Z'}\}$ then
9: $b_j$ = 1.
10: j ← j + 1.
11: end if
12: if j == k then
13: break.
14: end if
15: end for



• Use only variable names to embed secret data, not function names. Also, stick to local/static variables and do not use extern variable names for embedding; use some simple parser to achieve this.

• If the next message bit to embed is 1, append (or prepend) an underscore (_) to the identifier name; otherwise leave it as it is (e.g., a variable named var becomes var_ if the next bit to embed is 1 and stays var otherwise). During extraction, interpret the names in the same manner.

• Use some kind of symbol table (hash map) to keep track of every change of identifier name, and reflect the change at all places where the identifier is used.

• Skip compiler directives, macros and keywords.

• Keep track of the total number of bits embedded (e.g., store it inside a comment at the beginning, with or without simple encryption).
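The steps above can be sketched in Python for a small C fragment. Several things here are assumptions of the sketch rather than part of the described technique: the ordered list of local variable names is supplied by the caller (standing in for the "simple parser"), no original name ends in an underscore, renamed names do not collide with existing ones, and string literals or comments that contain a variable name are not treated specially. The identifier pattern uses `*` instead of `+` so that single-character names such as `i` are matched.

```python
import re

IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")

def embed_in_identifiers(source: str, variables: list, bits: str) -> str:
    """Append '_' to the i-th variable name when the i-th bit is 1."""
    if len(bits) > len(variables):
        raise ValueError("more bits than candidate variables")
    # Symbol table: original name -> name used everywhere in the stego source.
    renames = {
        name: (name + "_" if bit == "1" else name)
        for name, bit in zip(variables, bits)
    }
    return IDENTIFIER.sub(lambda m: renames.get(m.group(0), m.group(0)), source)

def extract_from_identifiers(stego_variables: list) -> str:
    """Read one bit per variable name: trailing underscore = 1, otherwise 0."""
    return "".join("1" if name.endswith("_") else "0" for name in stego_variables)

c_source = "int main(void) { int count = 0; int total = count + 1; return total; }"
stego = embed_in_identifiers(c_source, ["count", "total"], "10")
print(stego)
# int main(void) { int count_ = 0; int total = count_ + 1; return total; }
print(extract_from_identifiers(["count_", "total"]))  # 10
```

Because every occurrence of a renamed variable is rewritten through the same table, the stego source should still compile and behave as before under the assumptions listed above.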



4 Conclusions 

In this paper we presented simple algorithms and techniques for hiding data in html text and other source code. The technique can be extended to any case-insensitive language, with data embedded in a similar manner; for example, we can embed secret message bits in source code written in languages like Basic or Pascal, or in the case-insensitive sections (e.g., comments) of case-sensitive languages like C. Even for C-like case-sensitive languages we can embed with minimal distortion, for instance by slightly altering identifier names. Data hiding methods for images result in distorted stego-images, but the data hiding technique proposed for html does not create any visible distortion in the stego html text.